Skip to content

Conversation

@Smith-Cruise
Copy link
Contributor

Just looks like forget to use allow_field_id_missing in PartitionFieldFromJson(const nlohmann::json& json, bool allow_field_id_missing)

In v1 iceberg, we should allow field_id is missing.

Signed-off-by: Smith Cruise <[email protected]>
@Smith-Cruise Smith-Cruise changed the title For iceberg v1 PartitionSpec's PartitionField id can be optional Allow PartitionField's field_id is missing in Iceberg v1 Jun 10, 2025
std::vector<PartitionField> fields;
for (const auto& entry_json : partition_spec_json) {
ICEBERG_ASSIGN_OR_RAISE(auto field, PartitionFieldFromJson(entry_json));
ICEBERG_ASSIGN_OR_RAISE(auto field, PartitionFieldFromJson(entry_json, true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( can we use something like code below?)

ICEBERG_ASSIGN_OR_RAISE(auto field, PartitionFieldFromJson(entry_json, /*allow_field_id_missing=*/format_version==1));

@Fokko
Copy link
Contributor

Fokko commented Jun 10, 2025

I think we should improve this in another way. I don't think we should set it to -1 when it is missing:

// Partition field id in v1 is not tracked, so we use -1 to indicate that.

Instead, I would follow the same pattern as in Java. Where we assign them incrementally from 1000:

https://github.com/apache/iceberg/blob/7b8bd29ce80bc78ed882f0613f7570e78e325988/core/src/main/java/org/apache/iceberg/PartitionSpecParser.java#L137-L144

This is also suggested by the spec:

image

In V1, it is not allowed to drop or re-order fields, so this will always be consistent:

image

@Smith-Cruise
Copy link
Contributor Author

https://github.com/apache/iceberg/blob/7b8bd29ce80bc78ed882f0613f7570e78e325988/core/src/main/java/org/apache/iceberg/PartitionSpecParser.java#L137-L144

Your idea is correct.

I think I can change signature to PartitionFieldFromJson(const nlohmann::json& json, std::optional<int32_t> next_partition_field_id).

if next_partition_field_id has value, we assign field_id from next_partition_field_id directly.

@Smith-Cruise Smith-Cruise requested a review from mapleFU June 11, 2025 05:47
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think the logic with PartitionFieldFromJson(json, /*allow_field_id_missing=*/true) and current logic is equal, but this LGTM. Also cc @wgtmac

@Smith-Cruise
Copy link
Contributor Author

Generally I think the logic with PartitionFieldFromJson(json, /*allow_field_id_missing=*/true) and current logic is equal, but this LGTM. Also cc @wgtmac

Just ensure that field_id is always valid when PartitionField is created.

@wgtmac
Copy link
Member

wgtmac commented Jun 13, 2025

I believe @Fokko's idea has already been implemented as in https://github.com/apache/iceberg-cpp/blob/main/src/iceberg/json_internal.cc#L1058-L1061. Actually my intention is that json_internal.h/.cc are not public and -1 is used as an internal indicator for this special logic. So I agree with @mapleFU that the current PR is literally the same thing.

@Smith-Cruise
Copy link
Contributor Author

I believe @Fokko's idea has already been implemented as in https://github.com/apache/iceberg-cpp/blob/main/src/iceberg/json_internal.cc#L1058-L1061. Actually my intention is that json_internal.h/.cc are not public and -1 is used as an internal indicator for this special logic. So I agree with @mapleFU that the current PR is literally the same thing.

Or just using ICEBERG_ASSIGN_OR_RAISE(auto field, PartitionFieldFromJson(entry_json, /*allow_field_id_missing=*/format_version==1));.

@Smith-Cruise
Copy link
Contributor Author

@wgtmac CI run failed, could you help me to rerun it?

@Fokko
Copy link
Contributor

Fokko commented Jun 18, 2025

I believe @Fokko's idea has already been implemented

Nice, that's great to hear! @Smith-Cruise Could you fix the CI? :)

@Smith-Cruise
Copy link
Contributor Author

I believe @Fokko's idea has already been implemented

Nice, that's great to hear! @Smith-Cruise Could you fix the CI? :)

It just looks like a network issue; I think retrying can fix it.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @Smith-Cruise!

Sorry I don't have permission to rerun CIs. You may just push again to re-trigger or wait for @Fokko to rerun it.

BTW, could you modify the PR title to apply conventional commit message? For example, fix: allow PartitionField's field_id to be missing in Iceberg v1

Signed-off-by: Smith Cruise <[email protected]>
@Smith-Cruise Smith-Cruise changed the title Allow PartitionField's field_id is missing in Iceberg v1 fix: allow PartitionField's field_id to be missing in Iceberg v1 Jun 19, 2025
@Fokko Fokko merged commit 2eab0b8 into apache:main Jun 19, 2025
7 checks passed
@Fokko
Copy link
Contributor

Fokko commented Jun 19, 2025

@Smith-Cruise Thanks for fixing this! 🙌 and thanks for the review @mapleFU and @wgtmac

lishuxu pushed a commit to lishuxu/iceberg-cpp that referenced this pull request Jul 5, 2025
…che#121)

Just looks like forget to use `allow_field_id_missing` in
`PartitionFieldFromJson(const nlohmann::json& json, bool
allow_field_id_missing)`

In v1 iceberg, we should allow field_id is missing.

---------

Signed-off-by: Smith Cruise <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants